ZIPF
Overview
The ZIPF function computes values for the Zipf distribution (also known as the zeta distribution), a discrete probability distribution that models rank-frequency relationships following a power law. The distribution is particularly important in linguistics, information theory, and the study of complex systems, where a few items dominate while a long tail of rare items still carries non-negligible probability.
The Zipf distribution is named after linguist George Kingsley Zipf, who observed that in natural language texts, the most common word appears approximately twice as often as the second most common word, three times as often as the third most common, and so on. This phenomenon, known as Zipf’s law, extends far beyond linguistics to describe city populations, website traffic, income distributions, and many other natural and social phenomena.
The probability mass function (PMF) for the Zipf distribution is defined as:
$$f(k, a) = \frac{1}{\zeta(a) \cdot k^a}$$
where $k \geq 1$ is the rank, $a > 1$ is the shape parameter that controls how quickly probabilities decay with rank, and $\zeta(a)$ is the Riemann zeta function that serves as a normalization constant. This implementation uses SciPy’s zipf distribution, which provides the full suite of statistical functions including the cumulative distribution function (CDF), survival function (SF), and their inverses.
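As a quick check of the PMF formula, the sketch below evaluates it directly with `scipy.special.zeta` and compares the result with `scipy.stats.zipf.pmf`. The helper name `zipf_pmf_direct` is hypothetical and is not part of the ZIPF function itself.

```python
from scipy.special import zeta
from scipy.stats import zipf


def zipf_pmf_direct(k, a):
    """Evaluate 1 / (zeta(a) * k**a) directly (hypothetical helper)."""
    # zeta(a, 1) is the Riemann zeta function, the normalization constant
    return 1.0 / (zeta(a, 1) * k ** a)


a = 2.5
for k in (1, 2, 3):
    # The two columns should agree up to floating-point rounding
    print(k, zipf_pmf_direct(k, a), zipf.pmf(k, a))
```

For k=2 and a=2.5 both values round to 0.1318, matching Example 1.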
The shape parameter $a$ determines the “steepness” of the distribution: larger values of $a$ give higher ranks (less frequent items) dramatically lower probabilities, while values closer to 1 (but still greater than 1) produce a more gradual decline. The Zipf distribution is the limiting case of the finite-support Zipfian distribution as the number of ranks grows without bound, and it is the discrete analogue of the Pareto distribution.
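To make the effect of the shape parameter concrete, the following sketch compares the probability of rank 1 with that of rank 10 for a few values of a. Because the normalization constant cancels, the ratio is simply 10 raised to the power a. The helper `head_to_tail_ratio` and the chosen values of a are illustrative assumptions.

```python
from scipy.stats import zipf


def head_to_tail_ratio(a, k=10):
    """Return pmf(1) / pmf(k), which equals k**a (hypothetical helper)."""
    return zipf.pmf(1, a) / zipf.pmf(k, a)


for a in (1.5, 2.5, 4.0):
    # Larger a concentrates more mass on rank 1 and thins out the tail
    print(
        f"a={a}: pmf(1)={zipf.pmf(1, a):.4f}, "
        f"pmf(10)={zipf.pmf(10, a):.6f}, "
        f"ratio={head_to_tail_ratio(a):.1f}"
    )
```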
This example function is provided as-is without any representation of accuracy.
Excel Usage
=ZIPF(k, a, zipf_mode, loc)
k (float, required): Value at which to evaluate the distribution. For icdf/isf, probability in [0, 1].
a (float, required): Distribution shape parameter. Must be greater than 1.
zipf_mode (str, optional, default: “pmf”): Calculation mode to use.
loc (float, optional, default: 0): Location parameter that shifts the distribution.
Returns (float): Distribution result, or an error message string if an input is invalid.
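The loc parameter follows SciPy's convention for discrete distributions: evaluating at k with a given loc is the same as evaluating at k - loc with no shift. A minimal sketch, calling `scipy.stats.zipf` directly rather than the ZIPF wrapper:

```python
from scipy.stats import zipf

a = 2.5

# With loc=1 the support starts at 2 instead of 1
shifted = zipf.pmf(3, a, loc=1)
unshifted = zipf.pmf(2, a)
print(shifted, unshifted)  # the two values are equal

# Rank 1 now falls below the shifted support, so its probability is zero
below_support = zipf.pmf(1, a, loc=1)
print(below_support)
```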
Examples
Example 1: PMF at k=2 with a=2.5
Inputs:
| k | a |
|---|---|
| 2 | 2.5 |
Excel formula:
=ZIPF(2, 2.5)
Expected output:
0.1318
Example 2: CDF at k=2 with a=2.5
Inputs:
| k | a | zipf_mode |
|---|---|---|
| 2 | 2.5 | cdf |
Excel formula:
=ZIPF(2, 2.5, "cdf")
Expected output:
0.8772
Example 3: Survival function at k=2 with a=2.5
Inputs:
| k | a | zipf_mode |
|---|---|---|
| 2 | 2.5 | sf |
Excel formula:
=ZIPF(2, 2.5, "sf")
Expected output:
0.1228
Example 4: Inverse CDF at probability 0.5 with a=2.5
Inputs:
| k | a | zipf_mode | loc |
|---|---|---|---|
| 0.5 | 2.5 | icdf | 0 |
Excel formula:
=ZIPF(0.5, 2.5, "icdf", 0)
Expected output:
1
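The four examples above can be cross-checked against SciPy directly. This sketch assumes only that `scipy.stats.zipf` is available; the variable names are illustrative.

```python
from scipy.stats import zipf

a = 2.5

pmf_2 = zipf.pmf(2, a)        # Example 1: P(X = 2)
cdf_2 = zipf.cdf(2, a)        # Example 2: P(X <= 2)
sf_2 = zipf.sf(2, a)          # Example 3: P(X > 2) = 1 - CDF
icdf_half = zipf.ppf(0.5, a)  # Example 4: smallest k with CDF(k) >= 0.5

print(round(float(pmf_2), 4))
print(round(float(cdf_2), 4))
print(round(float(sf_2), 4))
print(float(icdf_half))
```

The inverse CDF returns 1 because rank 1 alone already carries more than half of the total probability when a=2.5.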
Python Code
import math

from scipy.stats import zipf as scipy_zipf


def zipf(k, a, zipf_mode='pmf', loc=0):
    """
    Compute Zipf distribution values: PMF, CDF, SF, ICDF, ISF, mean, variance, std, or median.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zipf.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        k (float): Value at which to evaluate the distribution. For icdf/isf, probability in [0, 1].
        a (float): Distribution shape parameter. Must be greater than 1.
        zipf_mode (str, optional): Calculation mode to use. Valid options (lowercase): pmf, cdf, sf, icdf, isf, mean, var, std, median. Default is 'pmf'.
        loc (float, optional): Location parameter that shifts the distribution. Default is 0.

    Returns:
        float: Distribution result, or an error message string. A 2D list passed as k yields a 2D list of results.
    """
    # Validate a
    try:
        a_val = float(a)
        if not (a_val > 1):
            return "Invalid input: a must be > 1."
    except Exception:
        return "Invalid input: a must be a number."

    # Validate loc
    try:
        loc_val = float(loc)
    except Exception:
        return "Invalid input: loc must be a number."

    # Validate zipf_mode (matching is case-sensitive)
    valid_modes = {"pmf", "cdf", "sf", "icdf", "isf", "mean", "var", "std", "median"}
    if not isinstance(zipf_mode, str) or zipf_mode not in valid_modes:
        return f"Invalid input: zipf_mode must be one of {sorted(valid_modes)}."

    # Helper to convert a single k value to float (None if not numeric)
    def process_k(val):
        try:
            return float(val)
        except Exception:
            return None

    # Helper to convert inf to string
    def inf_to_str(val):
        if isinstance(val, float) and math.isinf(val):
            return "inf" if val > 0 else "-inf"
        return val

    # Handle mean/var/std/median modes (k parameter is ignored)
    if zipf_mode == "mean":
        try:
            result = scipy_zipf.mean(a_val, loc=loc_val)
            return inf_to_str(float(result))
        except Exception as e:
            return f"Error computing mean: {str(e)}"
    if zipf_mode == "var":
        try:
            result = scipy_zipf.var(a_val, loc=loc_val)
            return inf_to_str(float(result))
        except Exception as e:
            return f"Error computing variance: {str(e)}"
    if zipf_mode == "std":
        try:
            result = scipy_zipf.std(a_val, loc=loc_val)
            return inf_to_str(float(result))
        except Exception as e:
            return f"Error computing standard deviation: {str(e)}"
    if zipf_mode == "median":
        try:
            result = scipy_zipf.median(a_val, loc=loc_val)
            return inf_to_str(float(result))
        except Exception as e:
            return f"Error computing median: {str(e)}"

    # PMF, CDF, SF, ICDF, ISF modes
    def compute(val):
        kval = process_k(val)
        if kval is None:
            return "Invalid input: k must be a number."
        # Validate probability range for icdf/isf
        if zipf_mode in ["icdf", "isf"]:
            if kval < 0 or kval > 1:
                return "Invalid input: probability must be between 0 and 1 for icdf/isf."
        try:
            if zipf_mode == "pmf":
                result = scipy_zipf.pmf(kval, a_val, loc=loc_val)
            elif zipf_mode == "cdf":
                result = scipy_zipf.cdf(kval, a_val, loc=loc_val)
            elif zipf_mode == "sf":
                result = scipy_zipf.sf(kval, a_val, loc=loc_val)
            elif zipf_mode == "icdf":
                result = scipy_zipf.ppf(kval, a_val, loc=loc_val)
            elif zipf_mode == "isf":
                result = scipy_zipf.isf(kval, a_val, loc=loc_val)
            else:
                return "Invalid mode."
            # Handle NaN
            if math.isnan(result):
                return "Result is NaN (not a number)."
            return inf_to_str(float(result))
        except Exception as e:
            return f"Error in {zipf_mode} calculation: {str(e)}"

    # Handle 2D list or scalar input
    if isinstance(k, list):
        # Validate 2D list structure
        if not all(isinstance(row, list) for row in k):
            return "Invalid input: k must be a scalar or 2D list."
        result = []
        for row in k:
            result_row = []
            for val in row:
                out = compute(val)
                # Any error message aborts the whole computation
                if isinstance(out, str):
                    return out
                result_row.append(out)
            result.append(result_row)
        return result
    else:
        return compute(k)
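The `mean` mode above can return the string "inf" because the Zipf distribution has a finite mean only when a > 2, in which case the mean equals ζ(a - 1) / ζ(a). The sketch below checks this against SciPy directly; the variable names are illustrative, and it assumes SciPy reports a divergent mean as `inf`.

```python
from scipy.special import zeta
from scipy.stats import zipf

# a > 2: the mean is finite and equals zeta(a - 1) / zeta(a)
mean_finite = zipf.mean(2.5)
mean_formula = zeta(1.5, 1) / zeta(2.5, 1)
print(mean_finite, mean_formula)

# 1 < a <= 2: the series for the mean diverges
mean_heavy = zipf.mean(1.5)
print(mean_heavy)
```

This is why the wrapper passes moment results through `inf_to_str` before returning them.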